Cluster and Feature Modeling from Combinatorial Stochastic Processes
Authors
Abstract
One of the focal points of the modern literature on Bayesian nonparametrics has been the problem of clustering, or partitioning, where each data point is modeled as being associated with one and only one of some collection of groups called clusters or partition blocks. Underlying these Bayesian nonparametric models is a set of interrelated stochastic processes, most notably the Dirichlet process and the Chinese restaurant process. In this paper we provide a formal development of an analogous problem, called feature modeling, for associating data points with arbitrary nonnegative integer numbers of groups, now called features or topics. We review the existing combinatorial stochastic process representations for the clustering problem and develop analogous representations for the feature modeling problem. These representations include the beta process and the Indian buffet process as well as new representations that provide insight into the connections between these processes. We thereby bring the same level of completeness to the treatment of Bayesian nonparametric feature modeling that has previously been achieved for Bayesian nonparametric clustering.
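The contrast drawn above between clustering (each point in exactly one block) and feature modeling (each point in any nonnegative number of features) can be illustrated with a minimal sketch of the standard sequential schemes for the Chinese restaurant process and the Indian buffet process. This is not the paper's code: the function names, the single parameter `alpha` for each process, and the use of Python's standard `random` module are assumptions made for illustration. The Poisson draw uses Knuth's simple multiplication method, which is adequate only for small rates.

```python
import math
import random


def sample_crp(n, alpha, rng):
    """Draw a partition of n points from a Chinese restaurant process.

    Returns a list of cluster labels; every point gets exactly one label.
    """
    labels = []
    counts = []  # counts[k] = number of points already in cluster k
    for i in range(n):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        # The weights sum to i + alpha because i points are already assigned.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)  # open a new cluster
        counts[k] += 1
        labels.append(k)
    return labels


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_ibp(n, alpha, rng):
    """Draw a feature allocation of n points from an Indian buffet process.

    Returns a list of sets; a point may own zero, one, or many features.
    """
    features = []
    counts = []  # counts[k] = number of earlier points that own feature k
    for i in range(1, n + 1):
        owned = set()
        # Point i reuses each existing feature k with probability counts[k] / i.
        for k, m in enumerate(counts):
            if rng.random() < m / i:
                owned.add(k)
        # It then adds Poisson(alpha / i) brand-new features.
        for _ in range(sample_poisson(alpha / i, rng)):
            counts.append(0)
            owned.add(len(counts) - 1)
        for k in owned:
            counts[k] += 1
        features.append(owned)
    return features


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_crp(8, alpha=1.0, rng=rng))  # one label per point
    print(sample_ibp(5, alpha=2.0, rng=rng))  # one (possibly empty) set per point
```

The difference between the two outputs mirrors the abstract's distinction: the CRP returns a single integer label per point, while the IBP returns a set per point, which may be empty or contain several features.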
Similar resources
Clusters and Features from Combinatorial Stochastic Processes
In partitioning (a.k.a. clustering) data, we associate each data point with one and only one of some collection of groups called clusters or partition blocks. Here, we formally establish an analogous problem, called feature allocation, for associating data points with arbitrary non-negative integer numbers of groups, now called features or topics. Just as the exchangeable partit...
A Novel Combinatorial Approach to Discrete Fracture Network Modeling in Heterogeneous Media
In Iran, fractured reservoirs hold about 85 percent of oil resources and 90 percent of gas resources. A comprehensive investigation of fractures, the main factor affecting fluid flow and possibly a barrier to it, is necessary for reservoir development studies. The high degree of heterogeneity and the sparseness of data have rendered conventional deterministic methods inadequate for fracture network modelin...
A Useful Family of Stochastic Processes for Modeling Shape Diffusions
One of the new areas of research emerging in statistics is shape analysis. Shape is defined as all the geometrical information of an object that remains when its location, scale, and orientation are disregarded. Diffusion in shape analysis can be studied either through perturbation of the key coordinates identifying the initial object or through random evolution of the shape itself. Reviewing the f...
A Statistical Study of two Diffusion Processes on Torus and Their Applications
Diffusion processes such as Brownian motion and Ornstein-Uhlenbeck processes are classes of stochastic processes that have been investigated by researchers in various disciplines, including the biological sciences. It is usually assumed that the outcomes of these processes lie in Euclidean space. However, some data in physical, chemical and biological phenomena indicate that they cann...